The file US Electricity.csv contains a time-series index compiled by the US Federal Reserve representing total fossil-fuel electricity generation by all US utilities from January 1939 through October 2021.
In the following code box we read the CSV file, set up the data as a tsibble, plot the full series, and then subset it to examine the recent period.
library(fpp3)
D <- read.csv("US Electricity.csv") %>%
  mutate(DATE = yearmonth(DATE)) %>%
  as_tsibble(index = DATE)
D %>% autoplot(ELEC)
DR <- D %>% filter(DATE >= yearmonth("2010 Jan"))
DR %>% autoplot(ELEC)
We are interested in developing a two-year (24-month) monthly forecast of national electricity production requirements.
The first step in fitting an ARIMA model to a time series is to determine whether the series is stationary and, if it is not, how many differences are necessary to make it stationary.
DR %>%
  mutate(diff.E = difference(ELEC),        # first difference
         diff2.E = difference(diff.E)) -> DE   # second difference
# Examine Stationarity Visually
DE %>% gg_tsdisplay(ELEC, plot_type = "partial")
DE %>% gg_tsdisplay(diff.E, plot_type = "partial")
DE %>% gg_tsdisplay(diff2.E, plot_type = "partial")
From visual examination of the plots it is apparent that the ELEC time series is not stationary, but the first and second differences may be.
Next we examine the KPSS and ADF unit-root test results on these three series.
The null hypothesis of the KPSS test is that the data are stationary, so we need a large p-value so that we CANNOT reject the null hypothesis.
The null hypothesis of the ADF test, on the other hand, is that the data are NOT stationary (they contain a unit root), so we need a small p-value to reject the null hypothesis.
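To see why a unit-root series calls for differencing, here is a minimal base-R sketch (no packages; the object names are ours for illustration): a random walk is non-stationary, but its first difference recovers the stationary shocks exactly.

```r
set.seed(42)
e  <- rnorm(500)   # stationary white-noise shocks
rw <- cumsum(e)    # random walk: contains a unit root, non-stationary
d  <- diff(rw)     # first difference
# Differencing a random walk returns the shock series (after the first value):
isTRUE(all.equal(d, e[-1]))   # TRUE
```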
# Unit Root Test
DE %>% features(ELEC, unitroot_kpss)
DE %>% features(diff.E, unitroot_kpss)
DE %>% features(diff2.E, unitroot_kpss)
DE %>% features(ELEC, unitroot_ndiffs)
# ADF Test (adf.test() is in the tseries package)
library(tseries)
DE$ELEC %>% adf.test()
Augmented Dickey-Fuller Test
data: .
Dickey-Fuller = -3.8581, Lag order = 5, p-value = 0.01823
alternative hypothesis: stationary
DE$diff.E %>%
na.omit() %>%
adf.test()
Warning in adf.test(.) : p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: .
Dickey-Fuller = -7.3944, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
DE$diff2.E %>%
na.omit() %>%
adf.test()
Warning in adf.test(.) : p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: .
Dickey-Fuller = -14.314, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
According to the KPSS test we need to difference ELEC twice before fitting the model. The ADF test, on the other hand, indicates that we may be able to fit a model to the data after differencing it once or twice.
Based on the ACF and PACF plots, the following three models are recommended. The data show yearly seasonality and three strong PACF terms, so AR models with a single seasonal difference at m = 12 should fit this data well.
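The seasonal difference these models apply (D = 1 at m = 12 in the PDQ term) corresponds in base R to diff(x, lag = 12); a minimal sketch on a purely periodic series, where the yearly pattern cancels exactly:

```r
season <- rep(sin(2 * pi * (1:12) / 12), times = 10)  # repeating 12-month pattern
x <- 100 + season                                     # level + deterministic seasonality
sd12 <- diff(x, lag = 12)                             # seasonal difference, m = 12
all(abs(sd12) < 1e-12)   # TRUE: a period-12 pattern is removed entirely
```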
m <- DE %>% model(m1 = ARIMA(ELEC ~ pdq(3,0,0) + PDQ(1,1,0)),
                  m2 = ARIMA(ELEC ~ pdq(2,0,0) + PDQ(1,1,0)),
                  m3 = ARIMA(ELEC ~ pdq(1,0,0) + PDQ(1,1,0)),
                  m4 = ARIMA(ELEC),
                  m5 = ETS(ELEC))
m %>% glance() %>%
select(.model, AICc, BIC)
Here, m1 is ARIMA(3,0,0)(1,1,0)[12], m2 is ARIMA(2,0,0)(1,1,0)[12], m3 is ARIMA(1,0,0)(1,1,0)[12], m4 is the automatically selected ARIMA model, and m5 is the ETS model.
m %>% augment() %>%
features(.resid, ljung_box, lag = 10)
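The Ljung-Box statistic computed by the ljung_box feature is also available in base R as stats::Box.test(); a self-contained sketch on simulated white noise (a stand-in for well-behaved residuals):

```r
set.seed(1)
res <- rnorm(200)  # simulated residuals with no autocorrelation
bt  <- Box.test(res, lag = 10, type = "Ljung-Box")  # H0: no autocorrelation up to lag 10
bt$statistic  # chi-squared statistic on 10 degrees of freedom
bt$p.value    # a large p-value means we cannot reject residual independence
```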
m %>% select(m1) %>% gg_tsresiduals()
m %>% select(m2) %>% gg_tsresiduals()
m %>% select(m3) %>% gg_tsresiduals()
m %>% select(m4) %>% gg_tsresiduals()
m %>% select(m5) %>% gg_tsresiduals()
The analysis above implies that we cannot reject the residual-independence hypothesis for the ARIMA models: their high Ljung-Box p-values indicate that the residuals are uncorrelated.
m %>% select(m1) %>% report()
Series: ELEC
Model: ARIMA(3,0,0)(1,1,0)[12]
Coefficients:
ar1 ar2 ar3 sar1
0.4617 0.0471 0.0611 -0.2317
s.e. 0.0875 0.0975 0.0887 0.0891
sigma^2 estimated as 13.15: log likelihood=-347.66
AIC=705.32 AICc=705.81 BIC=719.62
m %>% select(m2) %>% report()
Series: ELEC
Model: ARIMA(2,0,0)(1,1,0)[12]
Coefficients:
ar1 ar2 sar1
0.4665 0.0767 -0.2400
s.e. 0.0873 0.0877 0.0879
sigma^2 estimated as 13.09: log likelihood=-347.9
AIC=703.79 AICc=704.12 BIC=715.23
m %>% select(m3) %>% report()
Series: ELEC
Model: ARIMA(1,0,0)(1,1,0)[12]
Coefficients:
ar1 sar1
0.5054 -0.2361
s.e. 0.0754 0.0878
sigma^2 estimated as 13.06: log likelihood=-348.28
AIC=702.56 AICc=702.75 BIC=711.13
m %>% select(m4) %>% report()
Series: ELEC
Model: ARIMA(1,0,0)(2,1,0)[12]
Coefficients:
ar1 sar1 sar2
0.3757 -0.3375 -0.4386
s.e. 0.0877 0.0834 0.0870
sigma^2 estimated as 10.76: log likelihood=-337.76
AIC=683.53 AICc=683.85 BIC=694.97
m %>% select(m5) %>% report()
Series: ELEC
Model: ETS(M,N,A)
Smoothing parameters:
alpha = 0.2924178
gamma = 0.0001000089
Initial states:
l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7]
101.0318 7.817185 -6.612555 -11.00125 -2.759789 8.873747 10.47588 -0.3751617 -11.78396
s[-8] s[-9] s[-10] s[-11]
-13.88839 -2.460821 6.012828 15.70228
sigma^2: 9e-04
AIC AICc BIC
1019.152 1022.992 1063.383
On the basis of the AICc values of the five models, we conclude that the automatically selected ARIMA model, ARIMA(1,0,0)(2,1,0)[12], performs best overall, followed by the two manual ARIMA models ARIMA(1,0,0)(1,1,0)[12] and ARIMA(2,0,0)(1,1,0)[12].
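The reported AICc values follow from the AIC through the small-sample correction AICc = AIC + 2k(k+1)/(n - k - 1). A base-R check against the m3 report, where k = 3 (two estimated coefficients plus sigma^2) and n = 130 is our assumed effective sample size (142 months from Jan 2010 to Oct 2021, less 12 observations lost to the seasonal difference):

```r
aicc_from_aic <- function(aic, k, n) aic + 2 * k * (k + 1) / (n - k - 1)
# m3 = ARIMA(1,0,0)(1,1,0)[12] reports AIC = 702.56
round(aicc_from_aic(702.56, k = 3, n = 130), 2)  # 702.75, matching the report
```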
For model cross-validation purposes, stretch the DR data as follows:
D.CV <- DR %>%
  filter(DATE >= yearmonth("2010 Jan")) %>%
  stretch_tsibble(.init = 36, .step = 1)
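stretch_tsibble(.init = 36, .step = 1) produces expanding training windows: the first window holds observations 1-36, the next 1-37, and so on. A base-R sketch of that indexing logic (expanding_windows is our hypothetical helper, not the fpp3 implementation):

```r
expanding_windows <- function(n, init, step = 1) {
  lapply(seq(init, n, by = step), seq_len)  # each window starts at observation 1
}
w <- expanding_windows(n = 40, init = 36)
length(w)          # 5 windows
sapply(w, length)  # 36 37 38 39 40
```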
mC <- D.CV %>%
  model(arima200110 = ARIMA(ELEC ~ pdq(2,0,0) + PDQ(1,1,0)),
        arima100110 = ARIMA(ELEC ~ pdq(1,0,0) + PDQ(1,1,0)),
        arima100210 = ARIMA(ELEC ~ pdq(1,0,0) + PDQ(2,1,0)),
        ETS_auto    = ETS(ELEC ~ error("M") + trend("N") + season("A")))
mC